A Deep Linguistic Analysis for Cross-language Information Retrieval

Authors

  • Nasredine Semmar
  • Meriama Laïb
  • Christian Fluhr
Abstract

Cross-language information retrieval consists of submitting a query in one language and searching documents in one or more languages. The retrieved documents are ranked by their probability of being relevant to the user's request, so the highest-ranked document is considered the most likely to be relevant. The LIC2M cross-language information retrieval system is a weighted Boolean search engine based on a deep linguistic analysis of the query and the documents. The system is composed of a linguistic analyzer, a statistical analyzer, a reformulator, a comparator and a search engine. The linguistic analysis processes both the documents to be indexed and the queries in order to extract concepts representing their content; it includes morphological analysis, part-of-speech tagging and syntactic analysis. In this paper, we present the deep linguistic analysis used in the LIC2M cross-lingual search engine, focusing in particular on the impact of syntactic analysis on retrieval effectiveness.
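
To make the pipeline named in the abstract more concrete, here is a minimal Python sketch, loosely in the spirit of a weighted Boolean cross-language search. It is not the LIC2M implementation. Everything in it is an assumption made for illustration: the toy `LEXICON` (a hypothetical French-to-English dictionary standing in for the reformulator), the whitespace tokenizer standing in for the linguistic analyzer, the IDF-style weights standing in for the statistical analyzer, and all function names.

```python
import math
from collections import defaultdict

# Hypothetical bilingual lexicon (French -> English); a stand-in for the reformulator.
LEXICON = {
    "recherche": ["search", "retrieval"],
    "information": ["information"],
    "langue": ["language"],
}

# Tiny illustrative stopword list; a real analyzer would use morphology and POS tags.
STOPWORDS = {"the", "a", "of", "in", "de", "la", "en"}


def analyze(text):
    """Crude stand-in for linguistic analysis: lowercase, split, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def idf_weights(index, n_docs):
    """Assumed weighting: rarer terms get higher weight (smoothed IDF)."""
    return {term: math.log(n_docs / len(ids)) + 1.0 for term, ids in index.items()}


def reformulate(query_terms):
    """Turn each query concept into a group of target-language alternatives."""
    return [LEXICON.get(term, [term]) for term in query_terms]


def search(query, docs):
    """Score each document by summing, per query concept, its best matching weight."""
    index = build_index(docs)
    weights = idf_weights(index, len(docs))
    scores = defaultdict(float)
    for group in reformulate(analyze(query)):
        best = defaultdict(float)  # best alternative of this concept per document
        for term in group:
            for doc_id in index.get(term, ()):
                best[doc_id] = max(best[doc_id], weights[term])
        for doc_id, weight in best.items():
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `search("recherche information", docs)` on a small English collection ranks documents that match more query concepts, and rarer ones, ahead of the rest; documents matching no concept are omitted from the result.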

Similar resources

Arabic to French Sentence Alignment: Exploration of A Cross-language Information Retrieval Approach

Sentence alignment consists of estimating which sentence or sentences in the source language correspond to which sentence or sentences in the target language. In this paper, we present a new approach to aligning sentences from a parallel corpus based on a cross-language information retrieval system. This approach consists of building a database of sentences of the target text and considering eac...

LWA 2006 Proceedings

Information on the internet is a vast resource for question answering. As the amount of available information from web pages increases, novel methods for finding precise answers to user queries and questions must be found. Standard information retrieval methods are efficient, but often fail to provide a user with short, precise answers. A deep linguistic analysis of all information is time cons...

University of Hagen at CLEF 2005: Towards a Better Baseline for NLP Methods in Domain-Specific Information Retrieval

The third participation of the University of Hagen at the German Indexing and Retrieval Test (GIRT) task of the Cross Language Evaluation Campaign (CLEF 2005) aims at providing a better baseline for experiments with natural language processing (NLP) methods in domain-specific information retrieval (IR). Our monolingual experiments with the German document collection are based on a setup combinin...

Semantic annotation for concept-based cross-language medical information retrieval

We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech ...

Learning Semantics with Deep Belief Network for Cross-Language Information Retrieval

This paper introduces a cross-language information retrieval (CLIR) framework that combines the state-of-the-art keyword-based approach with a latent semantic-based retrieval model. To capture and analyze the hidden semantics in cross-lingual settings, we construct latent semantic models that map text in different languages into a shared semantic space. Our proposed framework consists of deep b...

Publication date: 2006